
    Credit Risk Scoring: A Stacking Generalization Approach

    Dissertation presented as the partial requirement for obtaining a Master's degree in Statistics and Information Management, specialization in Risk Analysis and Management.
    Credit risk regulation has received tremendous attention as a result of the latest global financial crisis. Under the Internal Ratings-Based approach of the Basel guidelines, banks are allowed to use internal risk measures as key drivers when assessing whether to grant a loan to an applicant. Credit scoring is a statistical approach used by financial and banking institutions to evaluate potential loan applications. When applying for a loan, an applicant fills out an application form detailing their characteristics (e.g., income, marital status, and loan purpose), which serve as inputs to a credit scoring model; the model produces a score that is used to determine whether the loan should be granted. This enables faster and more consistent credit approvals and reduces bad debt. Many machine learning and statistical approaches, such as logistic regression and tree-based algorithms, have been used individually in credit scoring models, and contemporary machine learning techniques can outperform these classic methods simply by combining models. This dissertation is an empirical study of loan default on a publicly available bank loan dataset, using ensemble-based techniques to increase model robustness and predictive power. The proposed ensemble method is based on stacking generalization, extending various preceding studies that used different techniques to further enhance predictive capability. The results show that combining different models provides a great deal of flexibility to credit scoring models.
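    As a rough illustration of the stacking idea described above, the sketch below combines a logistic regression and two tree-based learners under a logistic meta-learner with scikit-learn. The dataset path, column names, and model choices are assumptions for illustration, not the dissertation's actual configuration.

        # Minimal stacking-generalization sketch for credit scoring (illustrative only).
        # Assumes a CSV of already-numeric applicant features plus a binary "default"
        # column; the file name and columns are placeholders, not the study's data.
        import pandas as pd
        from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                                      StackingClassifier)
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split

        loans = pd.read_csv("bank_loans.csv")          # hypothetical dataset
        X, y = loans.drop(columns=["default"]), loans["default"]
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                                  stratify=y, random_state=42)

        # Level-0 learners: the individual models the abstract mentions.
        base_learners = [
            ("logit", LogisticRegression(max_iter=1000)),
            ("rf", RandomForestClassifier(n_estimators=300, random_state=42)),
            ("gbm", GradientBoostingClassifier(random_state=42)),
        ]

        # Level-1 meta-learner combines out-of-fold predictions of the base models.
        stack = StackingClassifier(estimators=base_learners,
                                   final_estimator=LogisticRegression(max_iter=1000),
                                   stack_method="predict_proba", cv=5)
        stack.fit(X_tr, y_tr)
        print("Stacked ensemble AUC:", roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1]))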

    Data Science for Finance: Targeted Learning from (Big) Data to Economic Stability and Financial Risk Management

    A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management, specialization in Statistics and Econometrics.
    The modelling, measurement, and management of systemic financial stability remain a critical issue in most countries. Policymakers, regulators, and managers depend on complex models for financial stability and risk management. These models are required to be robust, realistic, and consistent with all relevant available data, which demands extensive data disclosure of the highest quality standards. However, stressed situations, financial crises, and pandemics give rise to many new risks and new requirements, such as new data sources and different models. This dissertation aims to show the data quality challenges of high-risk situations such as pandemics or economic crises and to propose new machine learning models for predictive and longitudinal time series modelling.
    In the first study (Chapter Two) we analyzed and compared the quality of official datasets available for COVID-19, as a best practice for a recent high-risk situation with dramatic effects on financial stability. We used comparative statistical analysis to evaluate the accuracy of data collection by a national (Chinese Center for Disease Control and Prevention) and two international (World Health Organization; European Centre for Disease Prevention and Control) organizations, based on the value of systematic measurement errors. We combined Excel files, text mining techniques, and manual data entry to extract the COVID-19 data from official reports and to generate an accurate profile for comparisons. The findings show noticeable and increasing measurement errors in the three datasets as the pandemic outbreak expanded and more countries contributed data to the official repositories, raising data comparability concerns and pointing to the need for better coordination and harmonized statistical methods. The study offers a combined COVID-19 dataset and dashboard with minimal systematic measurement errors, along with valuable insights into the potential problems of using databanks without carefully examining the metadata and additional documentation that describe the overall context of the data.
    In the second study (Chapter Three) we discussed credit risk, the most significant source of risk in banking, one of the most important sectors among financial institutions. We proposed a new machine learning approach for online credit scoring that is sufficiently conservative and robust for unstable and high-risk situations. This chapter addresses the case of credit scoring in risk management and presents a novel method for the default prediction of high-risk branches or customers. The study uses the Kruskal-Wallis non-parametric statistic to form a conservative credit-scoring model and examines its impact on modeling performance to the benefit of the credit provider. The findings show that the new credit scoring methodology achieves a reasonable coefficient of determination and a very low false-negative rate. It is computationally less expensive, attains high accuracy, and yields around an 18% improvement in recall/sensitivity. Given the recent move toward continued credit/behavior scoring, our study suggests applying this credit score to non-traditional data sources, allowing online loan providers to study and reveal changes in client behavior over time and to choose reliable unbanked customers based on their application data. This is the first study to develop an online non-parametric credit scoring system that can automatically reselect effective features for continued credit evaluation and weight them by their level of contribution, with good diagnostic ability.
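    The following is a minimal sketch of the kind of Kruskal-Wallis feature screening and weighting described for Chapter Three, assuming a binary default label; the data source, significance threshold, and weighting rule are illustrative simplifications rather than the study's actual procedure.

        # Illustrative Kruskal-Wallis screening for a conservative, non-parametric score.
        # "loan_applications.csv" and its (numeric) columns are hypothetical placeholders.
        import pandas as pd
        from scipy.stats import kruskal

        applications = pd.read_csv("loan_applications.csv")
        label = applications["default"]                 # 1 = defaulted, 0 = repaid
        features = applications.drop(columns=["default"])

        # Kruskal-Wallis H statistic per feature: compares its distribution across the
        # default / non-default groups without any normality assumption.
        h_stats = {}
        for col in features.columns:
            repaid = features.loc[label == 0, col].dropna()
            defaulted = features.loc[label == 1, col].dropna()
            h, p = kruskal(repaid, defaulted)
            if p < 0.05:                                # keep only discriminating features
                h_stats[col] = h

        # Weight each retained feature by its share of the total H statistic.
        total_h = sum(h_stats.values())
        weights = {col: h / total_h for col, h in h_stats.items()}

        # A simple score: weighted sum of within-sample percentile ranks.
        ranks = features[list(weights)].rank(pct=True)
        score = sum(w * ranks[col] for col, w in weights.items())
        print(score.describe())
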
    In the third study (Chapter Four) we focus on the financial stability challenges faced by insurance companies and pension schemes when managing systematic (undiversifiable) mortality and longevity risk. For this purpose, we first developed a new ensemble learning strategy for panel time-series forecasting and studied its application to tracking respiratory disease excess mortality during the COVID-19 pandemic. The layered learning approach is an ensemble-learning solution that addresses a given predictive task with different predictive models when a direct mapping from inputs to outputs is not accurate. We adopt a layered learning approach within an ensemble learning strategy to solve predictive tasks with improved predictive performance and to combine the advantages of multiple learning processes in a single ensemble model. In the proposed strategy, the appropriate holdout for each model is specified individually, and the models in the ensemble are selected by a proposed selection approach and combined dynamically based on their predictive performance. This provides a high-performance ensemble model that automatically copes with the different kinds of time series of each panel member. For the experimental section, we studied more than twelve thousand observations in a portfolio of 61 time series (countries) of reported respiratory disease deaths with monthly sampling frequency to show the improvement in predictive performance. We then compare each country's forecasts of respiratory disease deaths generated by our model with the corresponding COVID-19 deaths in 2020. The results of this large set of experiments show that the accuracy of the ensemble model improves noticeably when different holdouts are used for the different contributing time series methods selected by the proposed model selection method. These improved time series models provide proper forecasts of respiratory disease deaths for each country, exhibiting a high correlation (0.94) with COVID-19 deaths in 2020.
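    A rough sketch of the per-series selection idea is shown below: each candidate forecaster is validated on a holdout window and the best performers are averaged. The candidate models, holdout length, and selection rule are assumptions for illustration, not the exact strategy developed in Chapter Four.

        # Toy per-series ensemble for panel forecasting: rank candidate models on a
        # holdout, keep the best, refit on the full series, and average the forecasts.
        # The data file, candidate set, and parameters below are hypothetical.
        import numpy as np
        import pandas as pd
        from statsmodels.tsa.arima.model import ARIMA
        from statsmodels.tsa.holtwinters import ExponentialSmoothing

        panel = pd.read_csv("respiratory_deaths_monthly.csv", index_col="month")

        def candidates(train):
            return {
                "holt": lambda h: ExponentialSmoothing(train, trend="add").fit().forecast(h),
                "arima": lambda h: ARIMA(train, order=(1, 1, 1)).fit().forecast(h),
                "naive": lambda h: np.repeat(train[-1], h),
            }

        def ensemble_forecast(series, holdout=12, horizon=12, keep=2):
            train, valid = series.values[:-holdout], series.values[-holdout:]
            # Rank candidates by RMSE on the holdout and keep the `keep` best.
            rmse = {name: np.sqrt(np.mean((fc(holdout) - valid) ** 2))
                    for name, fc in candidates(train).items()}
            best = sorted(rmse, key=rmse.get)[:keep]
            # Refit the selected models on the full series and average their forecasts.
            refit = candidates(series.values)
            return np.mean([refit[name](horizon) for name in best], axis=0)

        forecasts = {c: ensemble_forecast(panel[c].dropna()) for c in panel.columns}
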
    In the fourth study (Chapter Five) we used the new ensemble learning approach for time series modeling discussed in the previous chapter, accompanied by K-means clustering, to forecast life tables in COVID-19 times. Stochastic mortality modeling plays a critical role in public pension design, population and public health projections, and the design, pricing, and risk management of life insurance contracts and longevity-linked securities. There is no general method for forecasting mortality rates that is applicable to all situations, especially to unusual years such as the COVID-19 pandemic. In this chapter, we investigate the feasibility of using an ensemble of traditional and machine learning time series methods to empower forecasts of age-specific mortality rates for groups of countries that share common longevity trends. We use Generalized Age-Period-Cohort stochastic mortality models to capture age and period effects, apply K-means clustering to the time series to group countries following common longevity trends, and use ensemble learning to forecast life expectancy and annuity prices by age and sex. To calibrate the models, we use data for 14 European countries from 1960 to 2018. The results show that the ensemble method presents the most robust results overall, with the minimum RMSE, in the presence of structural changes in the shape of the time series at the time of COVID-19.
    In the dissertation's conclusions (Chapter Six), we provide more detailed insights into the overall contributions of this dissertation to financial stability and risk management through data science, together with the opportunities, limitations, and avenues for future research on the application of data science in finance and economics.
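    To illustrate the clustering step used in Chapter Five, the sketch below groups countries by the shape of their longevity trajectories with K-means before any mortality model is fitted per group. The data layout and number of clusters are assumptions, not the chapter's actual calibration.

        # Group countries sharing common longevity trends with K-means (illustrative).
        # Hypothetical table: rows = countries, columns = years 1960-2018,
        # values = period life expectancy at birth.
        import pandas as pd
        from sklearn.cluster import KMeans

        life_exp = pd.read_csv("life_expectancy_1960_2018.csv", index_col="country")

        # Standardize each country's trajectory so clusters reflect trend shape, not level.
        z = life_exp.sub(life_exp.mean(axis=1), axis=0).div(life_exp.std(axis=1), axis=0)

        labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(z.values)
        groups = pd.Series(labels, index=life_exp.index, name="cluster")

        # Each cluster would then receive its own GAPC / ensemble mortality forecast.
        for cluster, members in groups.groupby(groups):
            print(f"Cluster {cluster}: {', '.join(members.index)}")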

    A Comprehensive Survey on Enterprise Financial Risk Analysis: Problems, Methods, Spotlights and Applications

    Enterprise financial risk analysis aims at predicting an enterprise's future financial risk. Owing to its wide application, it has always been a core research issue in finance. Although there are already some valuable and impressive surveys on risk management, these surveys introduce approaches in a relatively isolated way and lack the recent advances in enterprise financial risk analysis. Due to the rapid expansion of the field, especially from the computer science and big data perspective, it is both necessary and challenging to comprehensively review the relevant studies. This survey attempts to connect and systematize the existing enterprise financial risk research and to summarize and interpret the mechanisms and strategies of enterprise financial risk analysis in a comprehensive way, which may help readers gain a better understanding of the current research status and ideas. The paper provides a systematic literature review of over 300 articles published on enterprise risk analysis modelling over a 50-year period, from 1968 to 2022. We first introduce the formal definition of enterprise risk as well as the related concepts. Then, we categorize the representative works in terms of risk type and summarize three aspects of risk analysis. Finally, we compare the analysis methods used to model enterprise financial risk. Our goal is to clarify current cutting-edge research and its possible future directions in modelling enterprise risk, aiming at a full understanding of the mechanisms of enterprise risk communication and influence and their application to corporate governance, financial institutions, and government regulation.

    Managing credit risk and the cost of equity with machine learning techniques

    Credit risk and the cost of equity can influence market participants' activities in many ways. In-depth analysis can help participants reduce potential costs and devise profitable strategies. Such studies are usually built on conventional statistical models informed by researchers' domain knowledge. However, with the advancement of technology, the massive amount of financial data, increasing in volume, subjectivity, and heterogeneity, has become challenging to process conventionally. Machine learning (ML) techniques have been utilised to handle this difficulty in real-life applications. This PhD thesis consists of three major empirical essays. We employ state-of-the-art machine learning techniques to predict peer-to-peer (P2P) lending default risk, P2P lending decisions, and the effects of Environmental, Social, and Corporate Governance (ESG) performance on firms' cost of equity. In the era of financial technology, P2P lending has gained considerable attention among academics and market participants. In the first essay (Chapter 2), we investigate the determinants of P2P lending default prediction in relation to borrowers' characteristics and credit history. Applying machine learning techniques, we document substantial predictive ability compared with the benchmark logit model. Further, we find that LightGBM has superior predictive power and outperforms all other models in all out-of-sample predictions. Finally, we offer insights into the different levels of uncertainty across P2P loan groups and the value of machine learning for credit risk mitigation by P2P loan providers. The macroeconomic impact on funding decisions and lending standards reflects the risk-taking behaviour of market participants and has been widely discussed by academics, but in the era of financial technology there is still a gap in the evidence on how lending standards change in FinTech nonbank financial organisations. The second essay (Chapter 3) aims to fill this gap by introducing loan-level and macroeconomic variables into predictive models to estimate the P2P loan funding decision. Over 12 million empirical instances are studied, while big data techniques, including text mining and five state-of-the-art approaches, are utilised. We note that macroeconomic conditions affect individual risk-taking and reaching-for-yield behaviour. Finally, we offer insight into the macroeconomic impact in terms of the different levels of uncertainty across P2P loan application groups. In the third essay (Chapter 4), we use up-to-date machine learning techniques to provide new evidence on the impact of ESG on the cost of equity. Using 15,229 firm-year observations from 51 different countries over the past 18 years, we document negative causal effects on the cost of equity. In addition, we uncover non-linear effects, as the impact of ESG on the cost of equity decreases as ESG performance improves. Furthermore, by breaking down our sample, we note the heterogeneity of ESG effects across regions. Finally, we find that global crises change the sensitivity of the cost of equity towards ESG, and that this change varies across regions.
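    A compact sketch of the LightGBM-versus-logit comparison underlying the first essay is given below; the dataset path, feature set, and hyperparameters are placeholders rather than the thesis setup.

        # Illustrative comparison of a logit benchmark and LightGBM for P2P default risk.
        # Assumes a CSV of already-numeric loan-level features; names are hypothetical.
        import pandas as pd
        import lightgbm as lgb
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split

        loans = pd.read_csv("p2p_loans.csv")            # hypothetical loan-level data
        X = loans.drop(columns=["defaulted"])           # borrower characteristics, credit history
        y = loans["defaulted"]
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                                  stratify=y, random_state=1)

        logit = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
        gbm = lgb.LGBMClassifier(n_estimators=500, learning_rate=0.05,
                                 num_leaves=63).fit(X_tr, y_tr)

        for name, model in [("logit benchmark", logit), ("LightGBM", gbm)]:
            auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
            print(f"{name}: out-of-sample AUC = {auc:.3f}")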

    Machine Learning methods in climate finance: a systematic review

    Preventing the materialization of climate change is one of the main challenges of our time. The involvement of the financial sector is a fundamental pillar in this task, which has led to the emergence of a new field in the literature, climate finance. In turn, the use of machine learning (ML) as a tool to analyse climate finance is on the rise, mainly due to the need to use big data to collect new climate-related information and to model complex non-linear relationships between climate and economic variables. Considering the proliferation of articles in this field, and the potential of ML, we propose a review of the academic literature to assess how ML is enabling climate finance to scale up. The main contribution of this paper is to provide a structure of application domains in a highly fragmented research field, aiming to spur further innovative work from ML experts. To pursue this objective, we first perform a systematic search of three scientific databases to assemble a corpus of relevant studies. Using topic modeling (Latent Dirichlet Allocation) we uncover representative thematic clusters, which allows us to statistically identify seven granular areas where ML is playing a significant role in the climate finance literature: natural hazards, biodiversity, agricultural risk, carbon markets, energy economics, ESG factors & investing, and climate data. Second, we analyse publication trends; and third, we show a breakdown of ML methods applied by research area.
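    The topic-modeling step can be sketched roughly as follows with scikit-learn's Latent Dirichlet Allocation; the corpus file, preprocessing choices, and fixing the number of topics at seven are illustrative assumptions, not the paper's exact pipeline.

        # Sketch of an LDA step for grouping abstracts into thematic clusters.
        # The corpus file and column name below are hypothetical placeholders.
        import pandas as pd
        from sklearn.decomposition import LatentDirichletAllocation
        from sklearn.feature_extraction.text import CountVectorizer

        abstracts = pd.read_csv("climate_finance_corpus.csv")["abstract"]

        # Bag-of-words representation, dropping very rare and very common terms.
        vectorizer = CountVectorizer(stop_words="english", max_df=0.9, min_df=5)
        dtm = vectorizer.fit_transform(abstracts)

        # Seven topics, matching the seven application areas identified in the paper.
        lda = LatentDirichletAllocation(n_components=7, random_state=0)
        doc_topics = lda.fit_transform(dtm)

        terms = vectorizer.get_feature_names_out()
        for k, weights in enumerate(lda.components_):
            top = [terms[i] for i in weights.argsort()[-10:][::-1]]
            print(f"Topic {k}: {', '.join(top)}")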

    Case Studies of Environmental Risk Analysis Methodologies


    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units located in Portugal is established using Stochastic Frontier Analysis. This methodology makes it possible to discriminate between measurement error and systematic inefficiency in the estimation process, enabling investigation of the main causes of inefficiency. Several suggestions for efficiency improvement are made for each hotel studied.
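    As a minimal sketch of the Stochastic Frontier Analysis idea, the code below fits a half-normal production frontier (Aigner-Lovell-Schmidt form) by maximum likelihood, which is what allows noise to be separated from systematic inefficiency; the data columns and Cobb-Douglas specification are hypothetical, not the paper's actual model.

        # Toy stochastic frontier (Cobb-Douglas, half-normal inefficiency) fitted by
        # maximum likelihood; the data layout and variable names are hypothetical.
        import numpy as np
        import pandas as pd
        from scipy.optimize import minimize
        from scipy.stats import norm

        data = pd.read_csv("hotel_panel.csv")                 # hypothetical hotel data
        y = np.log(data["revenue"].values)                    # log output
        X = np.column_stack([np.ones(len(data)),
                             np.log(data["rooms"].values),
                             np.log(data["staff"].values)])   # constant + log inputs

        def neg_loglik(params):
            """Negative log-likelihood of the half-normal frontier (ALS 1977)."""
            beta, log_sigma, log_lam = params[:-2], params[-2], params[-1]
            sigma, lam = np.exp(log_sigma), np.exp(log_lam)   # sigma^2 = s_v^2 + s_u^2, lam = s_u / s_v
            eps = y - X @ beta                                # composed error v - u
            ll = (np.log(2) - np.log(sigma) + norm.logpdf(eps / sigma)
                  + norm.logcdf(-eps * lam / sigma))
            return -ll.sum()

        start = np.concatenate([np.linalg.lstsq(X, y, rcond=None)[0], [0.0, 0.0]])
        fit = minimize(neg_loglik, start, method="BFGS")
        print("Frontier coefficients:", np.round(fit.x[:-2], 3))
        print("lambda (inefficiency / noise ratio):", round(float(np.exp(fit.x[-1])), 3))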

    Solving Multi-objective Integer Programs using Convex Preference Cones

    This survey has two objectives: first, to identify individuals who were victims of some type of crime and the manner in which it occurred; and second, to measure the effectiveness of the various competent authorities once individuals reported the crime they suffered. Additionally, the ENVEI seeks to explore citizens' perceptions of justice institutions and the rule of law in Mexico.